Install packages we will need.
# install.packages("ggplot2")
# install.packages("sp")
# install.packages("raster")
# install.packages("rgdal")
# install.packages("brew")
# install.packages("systemfonts")
# install.packages("mapview")
# install.packages("spatstat")
# install.packages("tmaptools")
# install.packages("elevatr")
# install.packages("rgbif")
# install.packages("remotes")
# remotes::install_github("Nowosad/spDataLarge")
# install.packages("spData")
# install.packages("gstat")
# install.packages("fields")
# install.packages("foreign")
# install.packages("velox")
What is a geographic information system (GIS)? While often thought of as means to making maps, the more basic utility of a GIS is the ability to overlay diverse spatial data layers. This is something that flat maps didn’t allow us to do, and is best gotten across with an example. Let’s start with tabular data on the point locations of detention basins (boring data that provides a great example!). In the process of working with this data you will be introduced to several topics (coordinate systems, vector and raster data) that we treat in more depth later.
Let’s first consider coordinates. Web services like Google Maps have locations across much of the world mapped to lat-long coordinates. When these locations are polygons (e.g. Athens-Clarke County), Google Maps has an associated centroid (the approximate geographic center of the polygon) and bounding box. So we can actually retrieve the coordinates of named locations by geocoding, basically an API call. Since Google now charges, we’ll use OpenStreetMap instead :)
library(tmaptools)
acc<-geocode_OSM("Athens-Clarke County")
acc$coords
## x y
## -83.35578 33.94587
acc$bbox
## xmin ymin xmax ymax
## -83.53747 33.84802 -83.24083 34.03947
I have a bunch of herbarium data on the occurrences of rare plants associated with granite outcrops in the Southeast.
Unfortunately, until recently, botanists collecting these specimens did not report coordinates. Nevertheless, many times location names can be used to generate coordinates. Many names are not unique, but usually are if combined with county and state.
Let’s see if we can get correct coordinates for Echols Mill (now quarried) nearby in Oglethorpe county using just the place name.
em1<-geocode_OSM("Echols Mill")
em2<-geocode_OSM("Echols Mill, Oglethorpe, Georgia")
When we don’t supply more detail, we get the centroid of Echols county in south Georgia on the Florida line. But when we do, we get the Echols Mill I had in mind.
Note: You can tell by the difference in coordinates. Or use JPS suggestion below:
If you’re geocoding, you will definitely want to doublecheck your coordinates by overlaying them on a map of the region of interest to make sure the API is making the right associations! Below, we’ll see how to overlay points on polygons to do a spatial join to collect the info you would need to check against for accuracy.
To illustrate, let’s bring in a .csv that contains the point locations of detention basins in Sandy Springs, which is near Atlanta in Fulton county (http://www.fultoncountyga.gov/fcgis-geospatial-data).
# Bring in Sandy Springs detention basin points from Fulton County GIS website
dbasins<-read.csv("Stormwater_Basins__Ponds__Hosted___Sandy_Springs_Georgia.csv")
# Take a look at the data
head(dbasins)
## X Y OBJECTID BasinType BasinSurf DateCreated ConstructType
## 1 -84.35676 33.88007 2801 Wet Pond Surface 1996 Riser
## 2 -84.35658 33.88119 2802 Dry Pond Surface 2007 Weir Wall
## 3 -84.35635 33.88207 2803 Dry Pond Surface 1989 Weir Wall
## 4 -84.35419 33.88468 2804 Dry Pond Surface 2001 Riser
## 5 -84.35284 33.88555 2805 Dry Pond Surface 1989 Weir Wall
## 6 -84.35674 33.88303 2806 Wet Pond Surface 1999 Riser
## DamHeight
## 1 8
## 2 5
## 3 6
## 4 10
## 5 3
## 6 5
Notice that there is an X and Y column giving the longitude and latitude of the location of each pond. But also a series of attributes for each pond, both categorical and continuous.
# What are the categories?
unique(dbasins$BasinType)
## [1] Wet Pond Dry Pond Other BMP
## Levels: Dry Pond Other BMP Wet Pond
unique(dbasins$BasinSurf)
## [1] Surface Underground Vault Parking Lot N/A
## Levels: N/A Parking Lot Surface Underground Vault
unique(dbasins$ConstructType)
## [1] Riser Weir Wall Stand Pipe Concrete Flume Earthen Flume
## [6] N/A
## Levels: Concrete Flume Earthen Flume N/A Riser Stand Pipe Weir Wall
We can explore these attributes without considering space. How many ponds by type, whether surface or underground, and construction type?
# Aggregate data by categories
dbtypes<-with(dbasins,table(BasinType,BasinSurf,ConstructType))
dbtypes<-as.data.frame(dbtypes)
head(dbtypes)
## BasinType BasinSurf ConstructType Freq
## 1 Dry Pond N/A Concrete Flume 0
## 2 Other BMP N/A Concrete Flume 0
## 3 Wet Pond N/A Concrete Flume 0
## 4 Dry Pond Parking Lot Concrete Flume 0
## 5 Other BMP Parking Lot Concrete Flume 0
## 6 Wet Pond Parking Lot Concrete Flume 0
We can also graph non-spatial summaries of the attribute data. First by categories.
# Load graphics package
library(ggplot2)
# Make barplot of frequencies across categories
ggplot(data=dbtypes, aes(y=Freq,x=BasinType,fill=ConstructType)) +
geom_bar( stat="identity") +
facet_grid(~BasinSurf) +
labs(y = "Count") +
labs(x = "") +
theme(legend.position="bottom")
Now by categorical and continuous fields.
# Get age
dbasins$Age<-2019-dbasins$DateCreated
# Show age range by construction type
ggplot(data=dbasins, aes(y=Age,x=ConstructType,fill=ConstructType)) +
geom_violin() + coord_flip() +
theme(legend.position="") +
labs(x = "")
## Warning: Removed 30 rows containing non-finite values (stat_ydensity).
To make use of the spatial component of the data in R, we need to convert the table to a spatial points dataframe (SPDF). I discuss data formats in more detail in the zoom recording.
# load sp package
library(sp)
# First let's look at arguments for the command SpatialPointsDataFrame
?SpatialPointsDataFrame
To convert to SPDF, we need the CORRECT descripton of the coordinate system.
# In this case the coordinates are lat-long so the value of the argument is
# CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
dbasin_pts<-SpatialPointsDataFrame(dbasins[,1:2],
dbasins,
coords.nrs = numeric(0),
proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
Now let’s take a quick look to make sure the points seem okay. The plot shows basically the shape of north Fulton county which is where the points should be.
plot(dbasin_pts)
Let’s look at the description of the SPDF. Notice that we now have points that can be displayed in 2D space that also have attributes linked to them.
dbasin_pts
## class : SpatialPointsDataFrame
## features : 1261
## extent : -84.44169, -84.25951, 33.87834, 34.00883 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 9
## names : X, Y, OBJECTID, BasinType, BasinSurf, DateCreated, ConstructType, DamHeight, Age
## min values : -84.44169472, 33.87833621, 2801, Dry Pond, N/A, 1950, Concrete Flume, 0, 1
## max values : -84.25950623, 34.00882877, 11812, Wet Pond, Underground Vault, 2018, Weir Wall, 999, 69
Similar to the geocoding example above, with these spatial points, we can make API calls through R packages to acquire, for example, elevation data from the USGS Elevation Point Query Service.
library(elevatr)
dbasins_subset<-dbasin_pts[1:10,]
dbasins_subset<-get_elev_point(dbasins_subset)
## Note: Elevation units are in &units=Meters
## Note:. The coordinate reference system is:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
dbasins_subset
## class : SpatialPointsDataFrame
## features : 10
## extent : -84.35676, -84.34985, 33.88007, 33.88555 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 11
## names : X, Y, OBJECTID, BasinType, BasinSurf, DateCreated, ConstructType, DamHeight, Age, elevation, elev_units
## min values : -84.35675673, 33.88007369, 2801, Dry Pond, Surface, 1989, Riser, 3, 12, 255.8, meters
## max values : -84.34984609, 33.88555104, 2810, Wet Pond, Underground Vault, 2007, Weir Wall, 10, 30, 283.14, meters
Now let’s bring in another layer. A shapefile of subwatershed polygons from USGS (U.S. Geological Survey, National Geospatial Program, 20200505, USGS National Hydrography Dataset Best Resolution (NHD) for Hydrologic Unit (HU) 8 - 03130001 (published 20200505): U.S. Geological Survey.). I downloaded this data using the National Map viewer website (https://viewer.nationalmap.gov/advanced-viewer/).
Note that the shapefile has become the standard format in which vector GIS data is usually served. Although developed originally by ESRI, it has become non-proprietary. IMPORTANT to note that a single shapefile is really a set of linked files (.shp, .dbf, .shx, .prj) that are all required, but are all called by using “filename.shp”.
# Load raster package
library(raster)
# Bring in the 12 digit hydrologic units (HUC)
hucs12<-shapefile("Hydrologic_Units_Subwatershed.shp")
What is the coordinate system of the hucs? Is it also lat-long (aka unprojected, aka geographic)? How can we find out?
hucs12
## class : SpatialPolygonsDataFrame
## features : 759
## extent : -85.83745, -82.91979, 32.33565, 34.87591 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 21
## names : OBJECTID, HUC_8, HUC_10, HUC_12, ACRES, NCONTRB_A, HU_10_GNIS, HU_12_GNIS, HU_10_DS, HU_10_NAME, HU_10_MOD, HU_10_TYPE, HU_12_DS, HU_12_NAME, HU_12_MOD, ...
## min values : 1, 03070101, 0307010101, 030701010101, 10055, 0, NA, NA, 0307010103, Amicalola Creek, DM, S, 030701010102, Acorn Creek-Chattahoochee River, DM, ...
## max values : 99, 03150108, 0315010810, 031501081006, 9963, 0, NA, NA, 0315010901, Yellowjacket Creek, UA, S, 031600020808, Zachry Creek-Chattahoochee River, UA, ...
So, we see that hucs12 includes long/lat data (as described by the crs value).
Now let’s overlay these layers to browse the data in R. You can zoom in or out and clip on points or polygons to see the attributes.
# Load mapview package to interactively view layers
library(mapview)
# View
mapview(dbasin_pts) + mapview(hucs12)
## Warning: sf layer has inconsistent datum (+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs).
## Need '+proj=longlat +datum=WGS84'
## Warning: sf layer has inconsistent datum (+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs).
## Need '+proj=longlat +datum=WGS84'